SPADE4: Sparsity and Delay Embedding based Forecasting of Epidemics
Predicting the evolution of diseases is challenging, especially when data are
scarce and incomplete. The most popular tools for modelling and
predicting infectious disease epidemics are compartmental models. They stratify
the population into compartments according to health status and model the
dynamics of these compartments using dynamical systems. However, these
predefined systems may not capture the true dynamics of the epidemic due to the
complexity of the disease transmission and human interactions. In order to
overcome this drawback, we propose Sparsity and Delay Embedding based
Forecasting (SPADE4) for predicting epidemics. SPADE4 predicts the future
trajectory of an observable variable without the knowledge of the other
variables or the underlying system. We use a random features model with sparse
regression to handle the data scarcity issue and employ Takens' delay embedding
theorem to capture the nature of the underlying system from the observed
variable. We show that our approach outperforms compartmental models when
applied to both simulated and real data.
Comment: 24 pages, 13 figures, 2 tables
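The two ingredients named in the abstract, delay embedding of a single observed variable and sparse regression on random features, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the helper names `delay_embed` and `ista_lasso`, the tanh feature map, the embedding dimension, the choice of the next observed value as regression target, and the use of plain ISTA as the L1 solver are not taken from the paper.

```python
import numpy as np

def delay_embed(series, dim, lag=1):
    """Stack delayed copies of a scalar series: row t is [x[t], x[t+lag], ..., x[t+(dim-1)*lag]]."""
    x = np.asarray(series, dtype=float)
    n_rows = len(x) - (dim - 1) * lag
    if n_rows <= 0:
        raise ValueError("series too short for the requested embedding")
    return np.stack([x[i * lag : i * lag + n_rows] for i in range(dim)], axis=1)

def ista_lasso(A, y, lam, n_iter=500):
    """Minimize 0.5*||A c - y||^2 + lam*||c||_1 by iterative soft thresholding (ISTA)."""
    step = 1.0 / (np.linalg.norm(A, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    c = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = c - step * (A.T @ (A @ c - y))                        # gradient step
        c = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return c

# Hypothetical toy data: one observed variable of an unknown system.
rng = np.random.default_rng(0)
series = np.sin(0.1 * np.arange(200))

H = delay_embed(series, dim=4)   # delay vectors built from the observable alone
X, y = H[:-1], series[4:]        # assumed target: the value following each delay vector

# Random features: fixed random weights, only the outer coefficients are learned.
W = rng.normal(size=(X.shape[1], 50))
b = rng.normal(size=50)
Phi = np.tanh(X @ W + b)

coef = ista_lasso(Phi, y, lam=0.01)  # sparse coefficient vector
prediction = Phi @ coef              # in-sample one-step fit
```

The L1 penalty drives many coefficients to exactly zero, which is the usual motivation for sparse regression when few observations are available.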
Consistency and convergence rate of phylogenetic inference via regularization
It is common in phylogenetics to have some, perhaps partial, information
about the overall evolutionary tree of a group of organisms and wish to find an
evolutionary tree of a specific gene for those organisms. There may not be
enough information in the gene sequences alone to accurately reconstruct the
correct "gene tree." Although the gene tree may deviate from the "species tree"
due to a variety of genetic processes, in the absence of evidence to the
contrary it is parsimonious to assume that they agree. A common statistical
approach in these situations is to develop a likelihood penalty to incorporate
such additional information. Recent studies using simulation and empirical data
suggest that a likelihood penalty quantifying concordance with a species tree
can significantly improve the accuracy of gene tree reconstruction compared to
using sequence data alone. However, the consistency of such an approach has not
yet been established, nor have convergence rates been bounded. Because
phylogenetics is a non-standard inference problem, the standard theory does not
apply. In this paper, we propose a penalized maximum likelihood estimator for
gene tree reconstruction, where the penalty is the square of the
Billera-Holmes-Vogtmann geodesic distance from the gene tree to the species
tree. We prove that this method is consistent, and derive its convergence rate
for estimating the discrete gene tree structure and continuous edge lengths
(representing the amount of evolution that has occurred on that branch)
simultaneously. We find that the regularized estimator is "adaptive fast
converging," meaning that it can reconstruct all edges of length greater than
any given threshold from gene sequences of polynomial length. Our method does
not require the species tree to be known exactly; in fact, our asymptotic
theory holds for any such guide tree.
Comment: 34 pages, 5 figures. To appear in The Annals of Statistics
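As a sketch of the estimator described in this abstract, the penalized maximum likelihood problem might be written as follows. The notation is assumed for illustration and is not taken from the paper: $\ell_k(T)$ denotes the log-likelihood of a candidate gene tree $T$ given $k$ sequence sites, $T_0$ the species (guide) tree, $d_{\mathrm{BHV}}$ the Billera-Holmes-Vogtmann geodesic distance, and $\lambda_k \ge 0$ a regularization weight. The $1/k$ normalization is likewise an assumption.

```latex
\hat{T}_k \;=\; \operatorname*{arg\,max}_{T}
  \left\{ \frac{1}{k}\,\ell_k(T) \;-\; \lambda_k \, d_{\mathrm{BHV}}(T, T_0)^{2} \right\}
```

With $\lambda_k = 0$ this reduces to ordinary maximum likelihood from the sequence data alone; larger values pull the estimate toward the guide tree.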
A Generalization Bound of Deep Neural Networks for Dependent Data
Existing generalization bounds for deep neural networks require data to be
independent and identically distributed (iid). This assumption may not hold in
real-life applications such as evolutionary biology, infectious disease
epidemiology, and stock price prediction. This work establishes a
generalization bound of feed-forward neural networks for non-stationary
φ-mixing data
- …